COUPLING WITH THE STATIONARY DISTRIBUTION AND 
IMPROVED SAMPLING FOR COLORINGS 
AND INDEPENDENT SETS 

By Thomas P. Hayes and Eric Vigoda 

University of California at Berkeley and Georgia Institute of Technology 

We present an improved coupling technique for analyzing the 
mixing time of Markov chains. Using our technique, we simplify and 
extend previous results for sampling colorings and independent sets. 
Our approach uses properties of the stationary distribution to avoid 
worst-case configurations which arise in the traditional approach. 

As an application, we show that for $k/\Delta > 1.764$, the Glauber 
dynamics on $k$-colorings of a graph on $n$ vertices with maximum 
degree $\Delta$ converges in $O(n \log n)$ steps, assuming $\Delta = \Omega(\log n)$ and 
that the graph is triangle-free. Previously, girth $\ge 5$ was needed. 

As a second application, we give a polynomial-time algorithm 
for sampling weighted independent sets from the Gibbs distribution 
of the hard-core lattice gas model at fugacity $\lambda < (1-\varepsilon)e/\Delta$, on 
a regular graph $G$ on $n$ vertices of degree $\Delta = \Omega(\log n)$ and girth 
$\ge 6$. The best known algorithm for general graphs currently assumes 
$\lambda < 2/(\Delta-2)$. 

1. Introduction. The coupling method is an elementary yet powerful 
technique for bounding the rate of convergence of a Markov chain to its 
stationary distribution. Traditionally, the coupling technique has been a 
standard tool in probability theory (e.g., [1, 4, 14]) and statistical physics 
(e.g., [15]). More recently, it has yielded significant results in theoretical 
computer science [3, 5, 9, 10, 11, 16, 17]. We refine the coupling method, and 
as a consequence, improve and simplify recent results on randomly sampling 
colorings and weighted independent sets. 

Consider a Markov chain on a finite state space $\Omega$ that has a unique 
stationary distribution $\pi$. A (one-step) coupling specifies, for every pair of 
states $(X_t,Y_t) \in \Omega^2$, a distribution for $(X_{t+1},Y_{t+1})$ such that $X_t \to X_{t+1}$, 
and similarly $Y_t \to Y_{t+1}$, behave according to the Markov chain. 

Let $\rho$ denote an arbitrary integer-valued metric on $\Omega$, and let $\mathrm{diam}(\Omega)$ 
denote its diameter, that is, the largest value of $\rho(x,y)$ over pairs of states. 
For $\varepsilon > 0$, we say a pair $(x,y) \in \Omega^2$ 
is $\varepsilon$ distance-decreasing if there exists a coupling such that 

$$E(\rho(X_1,Y_1) \mid X_0 = x, Y_0 = y) \le (1-\varepsilon)\rho(x,y).$$

The coupling theorem says that if every pair $(x,y)$ is distance-decreasing, 
then the Markov chain mixes rapidly: 

Theorem 1.1 (cf. [1]). Let $\varepsilon > 0$ and suppose every $(x,y) \in \Omega^2$ is $\varepsilon$ 
distance-decreasing. Let $X_0 \in \Omega$, $\delta > 0$ and 

$$T \ge \frac{\ln(\mathrm{diam}(\Omega)/\delta)}{\varepsilon}.$$

Then $\|X_T - \pi\|_{TV} \le \delta$. 

For a pair of distributions $\mu, \nu$ on a space $\Omega$, the variation distance metric 
is defined as 

$$\|\mu - \nu\|_{TV} = \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \nu(x)|.$$

We will often abuse notation, writing $\|X - \nu\|_{TV}$ for the variation distance 
between the distribution of a random variable $X$ and a distribution $\nu$. 
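For finite state spaces the definition can be evaluated directly. The following is a minimal sketch, assuming each distribution is given as a dictionary from states to probabilities; the function name `total_variation` and this representation are illustrative choices, not notation from the paper.

```python
from typing import Dict, Hashable


def total_variation(mu: Dict[Hashable, float], nu: Dict[Hashable, float]) -> float:
    """Half the L1 distance between two distributions on a finite set.

    States missing from a dictionary are treated as having probability 0.
    """
    states = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in states)
```

For example, a fair coin and a coin that always lands on one side are at variation distance 1/2.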

Our first coupling with stationarity theorem does not require every pair of 
states to be distance-decreasing. Instead, we only require that most states x 
be distance-decreasing with every y. 

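Theorem 1.2. Let $\varepsilon > 0$. Suppose $S \subseteq \Omega$ such that every $(x,y) \in S \times \Omega$ 
is $\varepsilon$ distance-decreasing, and 

$$\pi(S) \ge 1 - \frac{\varepsilon}{16\,\mathrm{diam}(\Omega)}.$$

Let $X_0 \in \Omega$, $\delta > 0$ and 

$$T \ge \left\lceil \frac{\ln(32\,\mathrm{diam}(\Omega))}{\varepsilon} \right\rceil \lceil \ln(1/\delta) \rceil.$$

Then $\|X_T - \pi\|_{TV} \le \delta$. 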

We will apply Theorem 1.2 to improve results on randomly sampling 
colorings; see Section 1.1. Theorem 1.2 is proved in Section 3. 
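To get a feel for the quantities involved, the step counts $\ln(\mathrm{diam}(\Omega)/\delta)/\varepsilon$ of Theorem 1.1 and $\lceil \ln(32\,\mathrm{diam}(\Omega))/\varepsilon \rceil \lceil \ln(1/\delta) \rceil$ of Theorem 1.2 can be tabulated numerically. The sketch below is only a calculator for those two expressions and for the mass $1 - \varepsilon/(16\,\mathrm{diam}(\Omega))$ required of $S$; the function names are hypothetical.

```python
import math


def coupling_time(diam: int, eps: float, delta: float) -> int:
    """Smallest integer T with T >= ln(diam/delta)/eps (Theorem 1.1)."""
    return math.ceil(math.log(diam / delta) / eps)


def stationarity_coupling_time(diam: int, eps: float, delta: float) -> int:
    """ceil(ln(32*diam)/eps) * ceil(ln(1/delta)) (Theorem 1.2)."""
    return math.ceil(math.log(32 * diam) / eps) * math.ceil(math.log(1 / delta))


def required_mass(diam: int, eps: float) -> float:
    """Stationary mass that the good set S must have in Theorem 1.2."""
    return 1 - eps / (16 * diam)
```

With diameter 10, $\varepsilon = 0.1$ and $\delta = 0.01$, the two step counts are 70 and 290, so the price of only requiring most states to be good is a modest constant factor.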

If we only require that most pairs of states be distance-decreasing, we 
can prove rapid mixing under the additional assumption that the initial 
distribution is a "warm start," as defined by Kannan, Lovász and Simonovits 
[13] for random walks in convex bodies. A distribution $X_0$ on $\Omega$ is said to 
be a warm start (with respect to $\pi$) if 

$$\text{for all } z \in \Omega, \qquad \Pr(X_0 = z) \le 2\pi(z).$$

Theorem 1.3. Let $\varepsilon > 0$. Suppose $S \subseteq \Omega$ such that every $(x,y) \in S^2$ is 
$\varepsilon$ distance-decreasing. Let $\delta > 0$, 

$$\pi(S) \ge 1 - \frac{\varepsilon\delta}{6\,\mathrm{diam}(\Omega)} \quad \text{and} \quad T \ge \frac{\ln(2\,\mathrm{diam}(\Omega)/\delta)}{\varepsilon}.$$

Then if $X_0$ is a warm start to $\pi$, 

$$\|X_T - \pi\|_{TV} \le \delta.$$

We will apply Theorem 1.3 to improve results on randomly sampling 
independent sets (see Section 1.2). Theorem 1.3 is proved in Section 3. 

1.1. Randomly sampling colorings. For a graph $G = (V,E)$ with maximum 
degree $\Delta$, the Glauber dynamics (heat-bath) is a simple Markov 
chain whose stationary distribution is uniform over proper $k$-colorings 
of $G$. Let $\Omega = [k]^V$, where $[k] = \{1,2,\ldots,k\}$. From $X_t \in \Omega$, the 
evolution $X_t \to X_{t+1}$ is defined as follows: 

• Choose $v$ uniformly at random from $V$. 

• For all $w \ne v$, set $X_{t+1}(w) = X_t(w)$. 

• Choose $X_{t+1}(v)$ uniformly at random from $[k] \setminus X_t(N(v))$, where $N(v)$ 
is the set of neighbors of $v$. In words, the new color for $v$ is randomly 
chosen from those colors not appearing in the neighborhood of $v$. 
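The update rule above can be sketched in a few lines of Python. This is an illustrative sketch only: the adjacency-dictionary representation of $G$, the function name `glauber_coloring_step` and the explicit `random.Random` argument are assumptions made for the example, not part of the paper.

```python
import random
from typing import Dict, List


def glauber_coloring_step(adj: Dict[int, List[int]], coloring: Dict[int, int],
                          k: int, rng: random.Random) -> Dict[int, int]:
    """One heat-bath update: recolor a uniformly random vertex with a
    uniformly random color not used by any of its neighbors."""
    v = rng.choice(sorted(adj))                    # uniform vertex
    blocked = {coloring[w] for w in adj[v]}        # colors seen on N(v)
    available = [c for c in range(1, k + 1) if c not in blocked]
    new_coloring = dict(coloring)                  # all other vertices keep their color
    if available:                                  # always nonempty when k exceeds the max degree
        new_coloring[v] = rng.choice(available)
    return new_coloring
```

Starting from a proper coloring, every step returns a proper coloring, since the recolored vertex never receives a color present in its neighborhood.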

The latest results on randomly sampling colorings, beginning with [5], use 
the following "burn-in" method for the analysis of the Glauber dynamics. 
After the Markov chain evolves for a sufficient number of steps, the so- 
called burn-in period, the coloring has certain "local uniformity" properties 
with high probability. Moreover, these properties persist for a polynomial 
number of steps. Consequently, to prove rapid mixing it suffices to prove 
there is a distance-decreasing coupling for every pair of states satisfying 
the local uniformity properties. Earlier works, for example, [11], analyze the 
worst-case pair of states and hence rely upon Theorem 1.1. The burn-in 
approach led to many significant improvements [5, 9, 10, 16] since it avoids 
the worst-case pair of states in the coupling analysis. 

Proving that the local uniformity properties appear for the Glauber dy- 
namics is very difficult. Roughly speaking, vertex colors are not independent, 
and their correlation has to be bounded. Dyer and Frieze [5] and Molloy [16] 
used the method of "paths of disagreement." Hayes [9] used a more sophisticated 
method of "conditional independence." As a by-product of these 
results, it follows immediately that a uniformly random coloring has the 
local uniformity property with high probability. 

Directly proving that a uniformly random coloring has these local unifor- 
mity properties is much easier than for colorings generated by the Glauber 
dynamics. Our upcoming Theorem 1.4 highlights this simplicity. By "cou- 
pling with stationarity" (Theorem 1.2), we are able to improve the main 
result of [9] with considerably less work than the original. 

Theorem 1.4. Let $\alpha = 1.763\ldots$ denote the solution to $x = \exp(1/x)$. 
Let $0 < \zeta < 1$. Let $G$ be a triangle-free graph on $n$ vertices having maximum 
degree $\Delta$, let $k \ge \max\{(1+\zeta)\alpha\Delta,\ 288\ln(96n^3/\zeta)/\zeta^2\}$ and let $X_0$ be any $k$-coloring 
of $G$. Then for every $\delta > 0$, after $T \ge 6n\lceil \ln(32n) \rceil \lceil \ln(1/\delta) \rceil / \zeta$ steps 
of the Glauber dynamics, 

$$\|X_T - \pi\|_{TV} \le \delta.$$

Earlier versions of Theorem 1.4 appeared in [5] and [9]. Both needed 
higher girth [$\Omega(\log \Delta)$ and $\ge 5$, resp.] and had considerably more difficult 
proofs. The proof of Theorem 1.4 is presented in Section 2. 

Hayes and Vigoda [10] have proved $O(n \log n)$ mixing time for $k > (1+\varepsilon)\Delta$ 
for all $\varepsilon > 0$, assuming $\Delta = \Omega(\log n)$ and girth $\ge 9$. Recently, Dyer, Frieze, 
Hayes and Vigoda [6] reduced the condition on $\Delta$ to a sufficiently large 
constant, assuming $k/\Delta > 1.489\ldots$ and girth $\ge 6$. 

Our results for graph colorings are syntactically similar to recent work by 
Goldberg, Martin and Paterson [8], who also examine $k$-colorings of triangle-free 
graphs for $k/\Delta > 1.763\ldots$. Their focus is on proving, for random colorings, 
that correlation between the colors assigned to two vertices decays 
exponentially fast with the distance between the pair. More precisely, they 
are proving a variant of a so-called strong spatial mixing property holds. For 
amenable graphs, strong spatial mixing is closely related to rapid mixing of 
the Glauber dynamics (cf. [7]). 

They prove their version of strong spatial mixing holds for every triangle-free 
graph for $k/\Delta > 1.763\ldots$ for all $\Delta$. Their proof and our proof utilize 
similar local uniformity properties of the stationary distribution. However, 
in their setting it suffices for the properties to hold in expectation, whereas 
in the analysis of the dynamics it appears essential for the properties to 
hold with high probability. Contrasting our results and proofs with theirs 
highlights the differences between strong spatial mixing and rapid mixing of 
the Glauber dynamics for general graphs. 

1.2. Randomly sampling independent sets. Given a graph $G = (V,E)$ and 
a fugacity $\lambda > 0$, the hard-core lattice gas model (see [2]) is defined on the set 
$\Omega$ of independent sets of $G$. The weight of $X \subseteq V$, where $X \in \Omega$, is defined 
as 

$$w(X) = \lambda^{|X|}.$$

We are interested in sampling from the (Boltzmann) Gibbs distribution $\pi$ 
on $\Omega$ where $\pi(X) = w(X)/Z$ and 

$$Z = Z(G,\lambda) = \sum_{X \in \Omega} w(X)$$

is the partition function. 

As with colorings, the Glauber dynamics for the hard-core model updates 
a random vertex at each step. From $X_t \in \Omega$, the transition $X_t \to X_{t+1}$ is 
defined by: 

• Choose a vertex $v$ uniformly at random from $V$. 

• Set 

$$X' = \begin{cases} X_t \cup \{v\}, & \text{with probability } \lambda/(1+\lambda), \\ X_t \setminus \{v\}, & \text{with probability } 1/(1+\lambda). \end{cases}$$

• If $X' \in \Omega$, set $X_{t+1} = X'$; otherwise set $X_{t+1} = X_t$. 

It is clear that the Glauber dynamics is reversible and ergodic, and the 
unique stationary distribution is $\pi$. 
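A single transition of this chain can be sketched as follows. The sketch assumes the graph is an adjacency dictionary and the current independent set is a `frozenset`; the name `glauber_hardcore_step` and these representations are illustrative assumptions rather than anything prescribed by the paper.

```python
import random
from typing import Dict, FrozenSet, List


def glauber_hardcore_step(adj: Dict[int, List[int]], X: FrozenSet[int],
                          lam: float, rng: random.Random) -> FrozenSet[int]:
    """One Glauber update for the hard-core model at fugacity lam."""
    v = rng.choice(sorted(adj))                    # uniform vertex
    if rng.random() < lam / (1 + lam):
        proposal = X | {v}                         # try to occupy v
    else:
        proposal = X - {v}                         # vacate v
    # Reject the proposal if it is not an independent set.
    if v in proposal and any(w in proposal for w in adj[v]):
        return X
    return proposal
```

Only the addition of $v$ can violate independence, so the rejection test inspects just the neighbors of $v$.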

The latest result for general graphs is $O(n \log n)$ mixing time for $\lambda < 
2/(\Delta - 2)$ by Vigoda [18]. It is widely believed the chain mixes rapidly for 
all $\lambda$ below a critical threshold of roughly $e/\Delta$ (see Section 4). 

Applying Theorem 1.3, we prove there exists an efficient algorithm which 
reaches the above threshold for large-degree regular graphs with girth at 
least 6. To guarantee the warm-start condition, we use a simulated annealing 
algorithm similar to Jerrum, Sinclair and Vigoda's algorithm for estimating 
the permanent [12]. 

Theorem 1.5. For all $\varepsilon > 0$, there exists $C > 0$ such that for every 
$\delta > 0$ the following holds. Let $G$ be a $\Delta$-regular graph on $n$ vertices, where 
$\mathrm{girth}(G) \ge 6$, and $\Delta \ge C \log(n/\delta)$. Let $\lambda \le (1-\varepsilon)e/\Delta$. Then there is an 
algorithm which outputs a random independent set of $G$ within variation distance 
$\delta$ of the Gibbs distribution at fugacity $\lambda$, with running time polynomial in 
$n$, $1/\varepsilon$ and $\log(1/\delta)$. 

Theorem 1.5 is proved in Section 4. The degree restriction prevents us 
from boosting the above sampling scheme to arbitrarily close distances to 
the stationary distribution, and also from applying the standard reduction 
from approximating the partition function to sampling from the Gibbs 
distribution. 

2. Sampling colorings. 

2.1. Local uniformity property. We will exploit a nice "local uniformity" 
property of the uniform distribution on proper $k$-colorings. This property 
was first used by Dyer and Frieze [5] to improve rapid mixing results of the 
Glauber dynamics. 

For any triangle-free graph we will show an easy lower bound on the ex- 
pected number of available colors for an arbitrary vertex v, where "available 
colors" refers to those colors not appearing in the neighborhood of v. When 
$k = \Omega(\log n)$, we can even prove that, with high probability, every vertex has 
essentially this many available colors. 

For any coloring X which satisfies our lower bound on available colors, 
it is easy to show that for every coloring Y, the pair (X, Y) is distance- 
decreasing under the natural "greedy" one-step coupling of Jerrum [11]. 
This allows us to apply our first "coupling with stationarity" Theorem 1.2 
to prove Theorem 1.4. 

Throughout, we will use the notation 

A(X,v) := [k]\X(N(v)) 

to denote the set of available colors for a vertex $v$ under a $k$-coloring $X$ [here 
$N(v) := \{w \in V \mid w \sim v\}$ denotes the set of neighbors of vertex $v$]. 

Lemma 2.1. Let $G = (V,E)$ be a triangle-free graph with maximum degree 
$\Delta$, let $0 < \beta < 1$ and $k \ge \Delta + 2/\beta$. Let $X$ be a random $k$-coloring of $G$. 
Then 

$$\Pr\bigl(\exists\, v \in V,\ |A(X,v)| < k(e^{-\Delta/k} - \beta)\bigr) \le n e^{-\beta^2 k/8}.$$

PROOF. Let $v \in V$. By definition, 

$$(1) \qquad |A(X,v)| = \sum_{j \in [k]} \prod_{w \in N(v)} (1 - X_{j,w}),$$

where $X_{j,w}$ is the indicator variable for the event $\{X(w) = j\}$. 

Henceforth, we will condition on the values of $X$ on $V \setminus N(v)$; denote this 
conditional information by $\mathcal{F}$. Conditioned on $\mathcal{F}$, since $G$ is triangle-free, 
the random variables $X(w)$, $w \in N(v)$, are fully independent and each $X(w)$ 
is uniform over the set $A(X,w)$. This allows us to write 

$$E(|A(X,v)| \mid \mathcal{F}) = \sum_{j \in [k]} \prod_{w \in N(v)} (1 - E(X_{j,w} \mid \mathcal{F})) = \sum_{j \in [k]} \prod_{\substack{w \in N(v) \\ A(X,w) \ni j}} \left(1 - \frac{1}{|A(X,w)|}\right).$$

Applying the arithmetic-geometric mean inequality, this implies 

$$E(|A(X,v)| \mid \mathcal{F}) \ge k \prod_{j \in [k]} \prod_{\substack{w \in N(v) \\ A(X,w) \ni j}} \left(1 - \frac{1}{|A(X,w)|}\right)^{1/k}$$

$$= k \prod_{w \in N(v)} \left(1 - \frac{1}{|A(X,w)|}\right)^{|A(X,w)|/k}$$

$$\ge k \prod_{w \in N(v)} \left(\frac{1 - 1/|A(X,w)|}{e}\right)^{1/k},$$

where the last step follows from the inequality $(1-p)^{1/p} \ge (1-p)/e$, which 
holds for all $0 < p < 1$. Since we are assuming $k \ge \Delta + 2/\beta$, every 
$|A(X,w)| \ge k - \Delta \ge 2/\beta$, and it follows that 

$$E(|A(X,v)| \mid \mathcal{F}) \ge k e^{-\Delta/k} (1 - \beta/2).$$

Since the colors $X(w)$, $w \in N(v)$, are fully independent, conditioned on 
$\mathcal{F}$, and since $|A(X,v)|$ is a Lipschitz function of these colors with Lipschitz 
constant 1, it follows by Chernoff's bounds that 

$$\Pr\bigl(|A(X,v)| \le k(e^{-\Delta/k} - \beta)\bigr) \le e^{-\beta^2 k/8}.$$

Taking a union bound over $v \in V$ completes the proof. □ 



Remark 2.2. A stronger form of Lemma 2.1 was proved by Hayes [9], 
who replaced the assumption $k \ge \Delta + 2/\beta$ by $k \ge \Delta + 2$ with slightly worse 
constants in the error probability bound. 



2.2. Most colorings are distance-decreasing. We now present a simple 
sufficient condition for a pair of colorings to be distance-decreasing. We 
use Jerrum's one-step coupling of the Glauber dynamics on $k$-colorings; 
see [11]. Each chain chooses the same vertex to recolor at every step. We 
then maximize the probability the chains choose the same new color for 
the updated vertex. Under this coupling, for the purposes of proving rapid 
mixing it suffices for one of the chains to have the local uniformity property 
considered in Lemma 2.1. 

Lemma 2.3. Let $0 < \beta < 1$, and suppose $X \in \Omega$ satisfies, for every $v \in 
V(G)$, 

$$|A(X,v)| \ge \frac{\Delta}{1-\beta}.$$

Then, for every $Y \in \Omega$, the pair $(X,Y)$ is $\beta/n$ distance-decreasing. 

PROOF. We need to prove, for every $Y \in \Omega$, 

$$E(\rho(X_1,Y_1) \mid X_0 = X, Y_0 = Y) \le \left(1 - \frac{\beta}{n}\right) \rho(X,Y),$$

where in this case $\rho$ denotes Hamming distance on graph colorings. Let us fix 
a coloring $Y \in \Omega$, and condition throughout on the event $\{X_0 = X, Y_0 = Y\}$. 

Let $v$ denote the vertex randomly selected for recoloring at time 1. Jerrum's 
coupling maximizes the probability the chains choose the same color 
for $v$. For a color $c$ available to $v$ in both chains, that is, $c \in A(X,v) \cap 
A(Y,v)$, we simultaneously color $v$ with $c$ in both $X$ and $Y$ with probability 
$\min\{1/|A(X,v)|, 1/|A(Y,v)|\}$. With the remaining probabilities the coupled 
color choices are arbitrary. Hence, 

$$\Pr(X_1(v) = Y_1(v) = c \mid v) = \frac{1}{\max\{|A(X,v)|, |A(Y,v)|\}}.$$

We now bound the probability the chains recolor $v$ to a different color in 
the two chains, 

$$\Pr(X_1(v) \ne Y_1(v) \mid v) = 1 - \frac{|A(X,v) \cap A(Y,v)|}{\max\{|A(X,v)|, |A(Y,v)|\}} = \frac{\max\{|A(X,v)|, |A(Y,v)|\} - |A(X,v) \cap A(Y,v)|}{\max\{|A(X,v)|, |A(Y,v)|\}}.$$

Hence, 

$$(2) \qquad \Pr(X_1(v) \ne Y_1(v) \mid v) \le \frac{|\{u \in N(v) \mid X(u) \ne Y(u)\}|}{\max\{|A(X,v)|, |A(Y,v)|\}}.$$

Finally, we bound the expected distance after the transition, 

$$E(\rho(X_1,Y_1)) = \sum_{w \in V} \Pr(X_1(w) \ne Y_1(w))$$

$$= \sum_{w \in V} \Pr(v \ne w \text{ and } X(w) \ne Y(w)) + \sum_{w \in V} \Pr(v = w \text{ and } X_1(w) \ne Y_1(w))$$

$$= \frac{n-1}{n} \rho(X,Y) + \sum_{w \in V} \Pr(v = w \text{ and } X_1(w) \ne Y_1(w))$$

$$\le \frac{n-1}{n} \rho(X,Y) + \frac{1}{n} \sum_{w \in V} \frac{|\{u \in N(w) \mid X(u) \ne Y(u)\}|}{\max\{|A(X,w)|, |A(Y,w)|\}}$$

$$\le \frac{n-1}{n} \rho(X,Y) + \frac{1}{n} \cdot \frac{\Delta\, \rho(X,Y)}{\Delta/(1-\beta)}$$

$$= \left(1 - \frac{\beta}{n}\right) \rho(X,Y).$$

The first inequality above holds by (2), and the second inequality uses the 
fact that $\Delta$ is the maximum degree and our assumption that $|A(X,w)| \ge 
\Delta/(1-\beta)$ for all $w \in V$. □ 

We now present the proof of Theorem 1.4. 

PROOF of Theorem 1.4. Define $\beta = \zeta/6$. Since by hypothesis, $k \ge 
(1+\zeta)\alpha\Delta \ge 3\Delta/2$ and $k \ge 288\ln(96n^3/\zeta)/\zeta^2 = 8\ln(16n^3/\beta)/\beta^2 \ge 8/\beta$, it 
follows that either $\Delta \ge 4/\beta$, and so $k \ge 3\Delta/2 \ge \Delta + 2/\beta$, or $\Delta < 4/\beta$, and 
so $k \ge 8/\beta \ge \Delta + 2/\beta$. Thus in either case $k \ge \Delta + 2/\beta$. Let 

$$S = \{X \in \Omega : (\forall v \in V)\ |A(X,v)| \ge k(e^{-\Delta/k} - \beta)\}.$$

Lemma 2.1 together with the hypothesis $k \ge 8\ln(16n^3/\beta)/\beta^2$ and the fact 
$\mathrm{diam}(\Omega) = n$, imply 

$$\pi(S) \ge 1 - n e^{-\beta^2 k/8} \ge 1 - \frac{\beta}{16n^2} = 1 - \frac{\varepsilon}{16\,\mathrm{diam}(\Omega)},$$

where $\varepsilon = \beta/n$. 

Recalling that $\alpha = 1.763\ldots < 2$ and that $0 < \zeta = 6\beta < 1$, it can be verified 
by elementary algebra that 

$$(1+\zeta)(1-\beta)(1-\alpha\beta) \ge 1,$$

and hence that 

$$\frac{\Delta}{1-\beta} \le \frac{k}{\alpha(1+\zeta)(1-\beta)} \le k\left(\frac{1}{\alpha} - \beta\right)$$

$$= k(\exp(-1/\alpha) - \beta) \quad \text{by definition of } \alpha$$

$$\le k(\exp(-\Delta/k) - \beta).$$

It follows that, by Lemma 2.3, every pair $(X,Y) \in S \times \Omega$ is $\varepsilon$ distance-decreasing, 
where $\varepsilon = \beta/n$. Applying Theorem 1.2 yields the desired result. 
□ 

3. Coupling with stationarity. In this section we will prove our coupling 
with stationarity theorems (Theorems 1.2 and 1.3). These will follow as 
corollaries of the following more general theorem about couplings which 
"usually" decrease distances. For an event $\mathcal{G}$, let $1_{\mathcal{G}}$ denote the indicator 
variable whose value is 1 if $\mathcal{G}$ occurs and 0 otherwise. 

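Theorem 3.1. Let $X_0, \ldots, X_T, Y_0, \ldots, Y_T$ be coupled Markov chains such 
that, for every $0 \le t \le T-1$, 

$$\Pr((X_t,Y_t) \text{ is not } \varepsilon \text{ distance-decreasing}) \le \delta.$$

Then 

$$\Pr(X_T \ne Y_T) \le ((1-\varepsilon)^T + \delta/\varepsilon)\, \mathrm{diam}(\Omega).$$

Proof. For $0 \le t \le T-1$, let $\mathcal{G}(t)$ denote the event that $(X_t,Y_t)$ is $\varepsilon$ 
distance-decreasing, and let $\overline{\mathcal{G}(t)}$ denote its complement. Observe that 

$$E(\rho(X_{t+1},Y_{t+1}) - (1-\varepsilon)\rho(X_t,Y_t))$$

$$= E((\rho(X_{t+1},Y_{t+1}) - (1-\varepsilon)\rho(X_t,Y_t)) 1_{\mathcal{G}(t)}) + E((\rho(X_{t+1},Y_{t+1}) - (1-\varepsilon)\rho(X_t,Y_t)) 1_{\overline{\mathcal{G}(t)}})$$

$$\le E((\rho(X_{t+1},Y_{t+1}) - (1-\varepsilon)\rho(X_t,Y_t)) 1_{\overline{\mathcal{G}(t)}})$$

$$\le E(\rho(X_{t+1},Y_{t+1}) 1_{\overline{\mathcal{G}(t)}})$$

$$\le \delta\, \mathrm{diam}(\Omega),$$

where the last step uses $\rho \le \mathrm{diam}(\Omega)$ and $\Pr(\overline{\mathcal{G}(t)}) \le \delta$. 
This can be rewritten as 

$$(3) \qquad E(\rho(X_{t+1},Y_{t+1})) \le (1-\varepsilon) E(\rho(X_t,Y_t)) + \delta\, \mathrm{diam}(\Omega).$$

Hence, 

$$E(\rho(X_1,Y_1)) \le (1-\varepsilon) E(\rho(X_0,Y_0)) + \delta\, \mathrm{diam}(\Omega) \le (1 - \varepsilon + \delta)\, \mathrm{diam}(\Omega).$$

And, for all $t \ge 0$, from (3), it follows by induction that 

$$E(\rho(X_t,Y_t)) \le ((1-\varepsilon)^t + \delta(1 - (1-\varepsilon)^t)/\varepsilon)\, \mathrm{diam}(\Omega) \le ((1-\varepsilon)^t + \delta/\varepsilon)\, \mathrm{diam}(\Omega).$$

Since $\rho$ is integer-valued, we obtain the desired conclusion by Markov's 
inequality: 

$$\Pr(X_T \ne Y_T) = \Pr(\rho(X_T,Y_T) \ge 1) \le E(\rho(X_T,Y_T)) \le ((1-\varepsilon)^T + \delta/\varepsilon)\, \mathrm{diam}(\Omega). \qquad □$$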
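The induction in this proof is a one-line linear recurrence, and it can be checked numerically. The sketch below iterates $d_{t+1} = (1-\varepsilon)d_t + \delta\,\mathrm{diam}(\Omega)$ from the worst case $d_0 = \mathrm{diam}(\Omega)$ and compares it with the closed-form bound $((1-\varepsilon)^t + \delta/\varepsilon)\,\mathrm{diam}(\Omega)$. The function names and the sample parameters are illustrative assumptions.

```python
from typing import List


def distance_bound_trajectory(eps: float, delta: float, diam: float, T: int) -> List[float]:
    """Iterate d_{t+1} = (1 - eps) * d_t + delta * diam, starting at d_0 = diam."""
    d = float(diam)
    trajectory = [d]
    for _ in range(T):
        d = (1 - eps) * d + delta * diam
        trajectory.append(d)
    return trajectory


def closed_form_bound(eps: float, delta: float, diam: float, t: int) -> float:
    """The bound ((1 - eps)^t + delta / eps) * diam on the expected distance at time t."""
    return ((1 - eps) ** t + delta / eps) * diam
```

The iterates settle at the floor $(\delta/\varepsilon)\,\mathrm{diam}(\Omega)$, which is why the applications need $\delta$ to be small compared with $\varepsilon/\mathrm{diam}(\Omega)$.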

We now present the proofs of our coupling with stationarity theorems. 

PROOF of Theorem 1.2. Let $X_0$ be arbitrary and let $Y_0$ be distributed 
according to $\pi$. Generate $X_1, \ldots, X_T, Y_1, \ldots, Y_T$ using the given coupling 
with initial states $X_0, Y_0$. Note that, for every $t \ge 0$, $Y_t$ is distributed 
according to $\pi$, and so, since every element of $\Omega \times S$ is $\varepsilon$ distance-decreasing, 

$$\Pr((X_t,Y_t) \text{ is not } \varepsilon \text{ distance-decreasing}) \le \Pr(Y_t \notin S) = 1 - \pi(S).$$

Let $T' = \lceil \ln(32\,\mathrm{diam}(\Omega))/\varepsilon \rceil$. We first show that after $T'$ steps, we are 
within distance 1/8 of stationarity. Then the standard boosting argument 
implies that after $T = T' \lceil \ln(1/\delta) \rceil$ steps, we are within distance $\delta$ of 
stationarity. 

Applying Theorem 3.1, and noting again that $Y_{T'} \sim \pi$, we have 

$$\|X_{T'} - \pi\|_{TV} \le \Pr(X_{T'} \ne Y_{T'})$$

$$\le ((1-\varepsilon)^{T'} + (1 - \pi(S))/\varepsilon)\, \mathrm{diam}(\Omega)$$

$$\le \left(\exp(-\varepsilon T') + \frac{1}{16\,\mathrm{diam}(\Omega)}\right) \mathrm{diam}(\Omega)$$

$$< 1/8.$$

Since the above holds for all $X_0 \in \Omega$, the triangle inequality implies that for 
all pairs of states $(W_0,Z_0) \in \Omega^2$, 

$$\|W_{T'} - Z_{T'}\|_{TV} \le 1/4 < 1/e.$$

Since there always exists a $T'$-step coupling which achieves the variation 
distance, the above can be boosted as follows (see [1]). We consider the 
$T$-step coupling generated by concatenating this $T'$-step coupling $\lceil \ln(1/\delta) \rceil$ 
times. Then 

$$\|W_T - \pi\|_{TV} \le \max_{Z_0} \Pr(W_T \ne Z_T)$$

$$\le \prod_{i=1}^{\lceil \ln(1/\delta) \rceil} \Pr(W_{iT'} \ne Z_{iT'} \mid W_{(i-1)T'} \ne Z_{(i-1)T'})$$

$$\le (1/e)^{\lceil \ln(1/\delta) \rceil} \le \delta.$$

□ 

Proof of Theorem 1.3. Let $X_0$ be a warm start to $\pi$, and let $Y_0 \sim \pi$. 
It follows from the definition of stationarity that $Y_t$ is distributed according 
to $\pi$ for all $t \ge 0$. We observe further that $X_t$ is a warm start for all $t \ge 0$ 
since, assuming $X_{t-1}$ is a warm start, we have, for every $x \in \Omega$, 

$$\Pr(X_t = x) = \sum_{x' \in \Omega} \Pr(X_{t-1} = x') P(x',x) \le \sum_{x' \in \Omega} 2\pi(x') P(x',x) = 2\pi(x).$$

Since every element of $S \times S$ is $\varepsilon$ distance-decreasing, it follows that 

$$\Pr((X_t,Y_t) \text{ is not } \varepsilon \text{ distance-decreasing}) \le \Pr(X_t \notin S) + \Pr(Y_t \notin S) \le 3(1 - \pi(S)).$$

Applying Theorem 3.1, and noting again that $Y_T \sim \pi$, we have 

$$\|X_T - \pi\|_{TV} \le \Pr(X_T \ne Y_T) \le \left((1-\varepsilon)^T + \frac{3(1 - \pi(S))}{\varepsilon}\right) \mathrm{diam}(\Omega) \le \delta,$$

for $T \ge \ln(2\,\mathrm{diam}(\Omega)/\delta)/\varepsilon$. □ 

4. Independent sets. In this section we will prove Theorem 1.5, present- 
ing an algorithm based on simulated annealing, which allows sampling from 
the Gibbs distribution for the hard-core model at fugacity $\lambda$, approaching the 
(believed) critical threshold $\lambda = e/\Delta$. Our algorithm, which we will present 
shortly, relies on the efficient convergence of the Glauber dynamics, given 
a warm start. More formally, we will require the following result, which is 
analogous to Theorem 1.4 for graph colorings. 

Lemma 4.1. Let $\zeta, \delta > 0$. Let $G$ be a $\Delta$-regular graph on $n$ vertices 
having girth at least 6, where $\Delta \ge 320000 \ln(144n^3/\zeta\delta)/\zeta^4$, and let $\lambda \le 
(1-\zeta)e/\Delta$ and $T \ge 8n\ln(2n/\delta)/\zeta$. Let $X_0$ be a warm start to $\pi$, the Gibbs 
distribution for the hard-core lattice gas model at fugacity $\lambda$. Then after $T$ 
steps of the Glauber dynamics, 

$$\|X_T - \pi\|_{TV} \le \delta.$$

We first present the algorithm of Theorem 1.5, which utilizes Lemma 4.1. 
We then prove Lemma 4.1. 

Simulated annealing algorithm. Let $\lambda > 0$ be given. Since the Glauber 
dynamics mixes in time $O(n \log n)$ whenever $\lambda < 2/(\Delta - 2)$, we assume 
without loss of generality that $\lambda \ge 2/(\Delta - 2) > 1/\Delta > 1/3n$. Define a 
sequence $\lambda_0 < \lambda_1 < \cdots < \lambda_k$, by $\lambda_0 = 0$, $\lambda_k = \lambda$, and for $1 \le i \le k-1$, $\lambda_i = 
(1 + 1/3n)^{i-1}/3n$, where $k = \lceil \log(3n\lambda)/\log(1 + 1/3n) \rceil$. 

We use the following simulated annealing algorithm. Let $Y_0$ be the empty 
set. For $1 \le i \le k$, simulate the Glauber dynamics at fugacity $\lambda_i$ for $T_i = 
O(n \log n)$ steps, starting from initial state $Y_{i-1}$, and let $Y_i$ be the state 
reached after $T_i$ steps. Let the constant hidden in the $O$ notation be large 
enough that Lemma 4.1 guarantees that Glauber dynamics mixes to within 
$\delta/k$ of the Gibbs distribution at fugacity $\lambda_i$, within $T_i$ steps, from any warm 
start. Output the final state $Y_k$. 
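The fugacity schedule can be generated mechanically. The following sketch builds the list $[\lambda_0, \ldots, \lambda_k]$ from the formulas above, assuming $\lambda > 1/3n$ as in the text; the function name `fugacity_schedule` is hypothetical, and the code does not simulate the chain itself.

```python
import math
from typing import List


def fugacity_schedule(n: int, lam: float) -> List[float]:
    """Return [lambda_0, ..., lambda_k] with lambda_0 = 0, lambda_k = lam and
    lambda_i = (1 + 1/(3n))^(i-1) / (3n) for 1 <= i <= k - 1.

    Assumes lam > 1/(3n), so that k >= 1.
    """
    base = 1 + 1 / (3 * n)
    k = math.ceil(math.log(3 * n * lam) / math.log(base))
    schedule = [0.0]
    schedule += [base ** (i - 1) / (3 * n) for i in range(1, k)]
    schedule.append(lam)
    return schedule
```

The schedule has $O(n \log(n\lambda))$ stages, so with $O(n \log n)$ steps per stage the total running time stays polynomial. In this sketch consecutive positive fugacities differ by a factor of at most $(1 + 1/3n)^2$.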

The proof of correctness, presented next, relies on showing 
that, for $1 \le i \le k$, the Gibbs distribution at fugacity $\lambda_{i-1}$ is a warm start 
to the Gibbs distribution at fugacity $\lambda_i$. Once this has been established, 
Lemma 4.1 will complete the proof. 

Proof of Theorem 1.5. Assume without loss of generality that $\lambda > 
1/\Delta \ge 1/n$. Note, for $\lambda \le 1/\Delta$ there is a straightforward coupling argument 
which proves the Glauber dynamics is close to its stationary distribution 
after $O(n \log n)$ steps; see [18] for a more complicated argument when $\lambda < 
2/(\Delta - 2)$. 

First, we prove that, for $1 \le i \le k$, the Gibbs distribution at fugacity $\lambda_{i-1}$ 
is a warm start to the Gibbs distribution at fugacity $\lambda_i$. For each $0 \le i \le k$, 
define the "partition function" $Z_i$ by 

$$Z_i = \sum_{\sigma \in \Omega} \lambda_i^{|\sigma|}.$$

It is clear that the desired warm-start condition is equivalent to $Z_i \le 2Z_{i-1}$. 
We handle the case $i = 1$ separately: 

$$Z_1 = \sum_{\sigma \in \Omega} \lambda_1^{|\sigma|} \le \sum_{j=0}^{n} \binom{n}{j} \lambda_1^j = \left(1 + \frac{1}{3n}\right)^n \le e^{1/3} < 2 = 2Z_0.$$

For $2 \le i \le k$, we use the fact that $\lambda_i \le (1 + 1/3n)\lambda_{i-1}$. From this it 
immediately follows that 

$$Z_i \le \left(1 + \frac{1}{3n}\right)^n Z_{i-1} \le e^{1/3} Z_{i-1} < 2Z_{i-1}.$$

This establishes the warm-start condition. 

Now, for $0 \le i \le k$, let $\pi_i$ denote the Gibbs distribution for fugacity $\lambda_i$. We 
now prove by induction on $i$ that $\|Y_i - \pi_i\|_{TV} \le i\delta/k$, which will complete 
the proof. 

The base case $i = 0$ is trivial. Let $i \ge 1$, and suppose by inductive hypothesis, 
$\|Y_{i-1} - \pi_{i-1}\|_{TV} \le (i-1)\delta/k$. To understand the distribution of $Y_i$, let 
$T = T_i$, and let us examine a $T$-step coupling of two copies of the Glauber 
dynamics at fugacity $\lambda_i$. Sample $A_0$ from the distribution of $Y_{i-1}$ and $B_0$ 
according to $\pi_{i-1}$; couple these distributions so that 

$$\Pr(A_0 \ne B_0) = \|Y_{i-1} - \pi_{i-1}\|_{TV} \le (i-1)\delta/k.$$

Sample $A_1, B_1, \ldots, A_T, B_T$ using a maximal coupling of the dynamics at 
fugacity $\lambda_i$. Since $B_0$ was a warm start, Lemma 4.1 tells us $\|B_T - \pi_i\|_{TV} \le 
\delta/k$. Now the triangle inequality tells us 

$$\|Y_i - \pi_i\|_{TV} = \|A_T - \pi_i\|_{TV} \le \Pr(A_T \ne B_T) + \|B_T - \pi_i\|_{TV}$$

$$\le \Pr(A_0 \ne B_0) + \|B_T - \pi_i\|_{TV} \le i\delta/k. \qquad □$$

4.1. Local uniformity. The main result of this section is a local uniformity 
property of random independent sets. As before, we assume the graph 
is $\Delta$-regular and the fugacity, $\lambda$, is less than $e/\Delta$. For convenience, we will 
also assume $\lambda > 1/\Delta$; we do not think this condition is necessary, but it will 
simplify the proof of one of our technical results, Lemma 4.6. Stated roughly, 
the uniformity property is that every vertex has about the same number of 
"unblocked" neighbors, by which we mean neighbors which could be added 
to the independent set without violating the independence condition, and 
moreover that this is a fairly small fraction of $\Delta$. In Section 4.2 we will show 
that any pair of independent sets with this uniformity property is distance-decreasing. 
These two results, together with Theorem 1.3, will be the key 
ingredients in the proof of Lemma 4.1, given in Section 4.3. 

We will use the following notation. For an independent set $X \subseteq V$ and 
vertex $v$, let 

$$U(X,v) := \{w \in N(v) : X \cap N^*(w) = \emptyset\},$$

where 

$$N^*(w) = N_v^*(w) = N(w) \setminus \{v\}.$$

Thus, $U(X,v)$ denotes the set of neighbors of $v$ that are unblocked in $X \setminus \{v\}$. 
For real numbers $a, b$ with $b > 0$, we will use the shorthand $a \pm b$ to denote 
the interval $[a - b, a + b]$. 

Lemma 4.2. Let $\zeta > 0$, let $G = (V,E)$ be a $\Delta$-regular graph of girth $\ge 6$ 
and let $1/\Delta < \lambda \le (1-\zeta)e/\Delta$. Let $\mu$ be the solution to $\mu = \exp(-\mu\lambda\Delta)$. Let 
$X \subseteq V$ be a random independent set drawn from the Gibbs distribution $\pi$ at 
fugacity $\lambda$. Then for every $\xi > 0$, 

$$\Pr\bigl((\forall v \in V)\ |U(X,v)| \in (\mu \pm \xi)\Delta\bigr) \ge 1 - 3n \exp\left(-\left(\frac{\xi\zeta}{8\lambda\Delta} - \frac{(e+1)^2}{\Delta}\right)^2 \frac{\Delta}{8}\right).$$

In particular, for any fixed values $\zeta, \xi, \delta > 0$, there exists $C > 0$ such that 
$\Delta \ge C \log n$ implies 

$$\Pr\bigl((\forall v \in V)\ |U(X,v)| \in (\mu \pm \xi)\Delta\bigr) \ge 1 - \delta/n^2.$$

We conjecture that a similar uniformity property should be true for nonregular 
graphs of maximum degree $\Delta$; however, in this setting, $\mu$ would not 
be a constant, but rather a function of the vertex $v$. If so, the rest of our 
results would also extend easily to this context. 

Our proof of Lemma 4.2 can be modified to give nontrivial bounds even 
when $\Delta = O(1)$. In this range, one must avoid taking union bounds over the 
vertex set. However, our proof of Lemma 4.1 requires such a union bound in 
another step, so we have focused on the case $\Delta = \Omega(\log n)$, which simplifies 
our results and proofs. 

The first tool for our proof of Lemma 4.2 is a "bootstrapping" mechanism, 
to convert local recurrences for functions on V into absolute bounds. 

We will use the following notational conventions. 

Definition 4.3. Let $\mathcal{I}$ denote the set of all real intervals, $[a,b]$. For 
any set $S \subseteq \mathbb{R}$, let $S \pm \xi$ denote the set $S + [-\xi,\xi] = \{x + y \mid x \in S, |y| \le \xi\}$. 

Lemma 4.4. Let $G = (V,E)$ be a graph, and let $f : V \to [0,1]$ be a random 
function (from any distribution over $[0,1]^V$). Let $\theta, \psi > 0$, let $g : \mathbb{R} \to \mathbb{R}$ and 
define $h : \mathcal{I} \to \mathcal{I}$ by $h(I) = g(I) \pm \theta$. Suppose that for every vertex $v$, 

$$\Pr\left(f(v) \notin \frac{1}{|N(v)|} \sum_{w \in N(v)} g(f(w)) \pm \theta\right) \le \psi.$$

Then 

$$\Pr\bigl((\forall v \in V, \forall t \ge 1)\ f(v) \in h^t([0,1])\bigr) \ge 1 - n\psi.$$

Proof. A union bound over $v \in V$ shows that 

$$(4) \qquad \Pr\left((\forall v \in V)\ f(v) \in \frac{1}{|N(v)|} \sum_{w \in N(v)} g(f(w)) \pm \theta\right) \ge 1 - n\psi.$$

Suppose this event holds, but that there exists some vertex $z$ and positive 
integer $t$ such that $f(z) \notin h^t([0,1])$. Without loss of generality, choose $z$ 
so that $t$ is as small as possible. Then in particular, for all $w \sim z$, $f(w) \in 
h^{t-1}([0,1])$. But in this case, specializing (4) to vertex $z$ implies 

$$f(z) \in \frac{1}{|N(z)|} \sum_{w \in N(z)} g(f(w)) \pm \theta \subseteq h^t([0,1]),$$

a contradiction. □ 

The next step in proving Lemma 4.2 is the following easy observation, 
which we will use again in Section 4.3. 

Observation 4.5. Let $\zeta \in (0,1)$, let $C = (1-\zeta)e$ and let $\mu = \exp(-C\mu)$. 
Then 

$$\mu \le (1 - \zeta/2)/C.$$

Proof. Let $f(x) = \exp(-Cx)$ and let $y = (1 - \zeta/2)/C$. Then 

$$\frac{f(y)}{y} = \frac{(1-\zeta)\exp(\zeta/2)}{1 - \zeta/2} < (1 - \zeta/2)\exp(\zeta/2) < 1,$$

where the last inequality is since $1 - x < \exp(-x)$ for all $x \ne 0$. Thus $f(y) < y$, 
which implies $y > \mu$, since $f$ is a decreasing function with fixed point $\mu$. □ 

Our next result is a strong sort of convergence for iterated applications of 
the mapping $x \mapsto \exp(-Cx)$, where $1 < C < e$. It says that, even permitting 
a small adversarial perturbation after every step, every trajectory under the 
iterated mapping quickly converges to a small interval around the unique 
fixed point $\mu = \exp(-C\mu)$. 

Lemma 4.6. Let $1 < C \le (1-\zeta)e$ and $\xi > 0$. Let $g(x) = \exp(-Cx)$, let 
$\mu$ be the unique fixed point of $g$ and set $\theta = \xi\zeta/8C$. Let $h(I) = g(I) \pm \theta$, and 
let $t = \lceil \frac{4}{\zeta} \ln(1 + \frac{1}{\xi}) \rceil$. Then 

$$h^t([0,1]) \subseteq \mu \pm \xi.$$

Remark 4.7. The assumption $C > 1$ is only for convenience in our 
proof; since rapid mixing is already known when $C < 2$, there was no particular 
reason to handle the case $0 < C \le 1$. However, the assumption $C < e$ is 
necessary. Indeed, for $C > e$, the unique fixed point for $x \mapsto \exp(-Cx)$ is unstable, 
and from any starting point $x_0 \ne \mu$, the sequence $g^i(x_0)$ approaches 
a fixed cycle of period 2 (also unique). 
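The dichotomy described in Remark 4.7 is easy to observe numerically. The sketch below simply iterates $x \mapsto \exp(-Cx)$ with no perturbation; the function name `iterate_map` and the sample values $C = 2$ and $C = 4$ are choices made for illustration.

```python
import math


def iterate_map(C: float, x0: float, steps: int) -> float:
    """Apply x -> exp(-C * x) to x0 the given number of times."""
    x = x0
    for _ in range(steps):
        x = math.exp(-C * x)
    return x


# C = 2 < e: the iterates settle at the fixed point mu = exp(-C * mu).
x = iterate_map(2.0, 0.5, 500)
fixed_point_residual = abs(x - math.exp(-2.0 * x))

# C = 4 > e: the iterates alternate between two distinct values.
a = iterate_map(4.0, 0.5, 2000)
b = math.exp(-4.0 * a)
c = math.exp(-4.0 * b)
```

For $C = 2$ the residual is numerically zero, while for $C = 4$ the values `a` and `c` agree and `b` differs from both, which is the period-2 behavior mentioned in the remark.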

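Proof of Lemma 4.6. Without loss of generality, $\zeta, \xi < 1$; otherwise 
there is nothing to prove. Define $I : [0,1] \to \mathcal{I}$ by 

$$I(x) = \left[\mu - \frac{x}{C},\ \mu - \frac{\ln(1-x)}{C}\right],$$

where we adopt the convention that $I(1) = [\mu - 1/C, \infty)$. By the definition 
of $h$, since $g$ is decreasing, and since $\mu$ is the fixed point of $g$, we have 

$$h(I(x)) = [\mu(1-x) - \theta,\ \mu e^x + \theta].$$

Next we will prove that, whenever $x \ge \xi/2$, 

$$(5) \qquad h(I(x)) \subseteq I(x(1 - \zeta/4)).$$

This is equivalent to checking that 

$$(6) \qquad \mu x + \theta \le \frac{x(1 - \zeta/4)}{C}$$

and 

$$(7) \qquad \mu(e^x - 1) + \theta \le \frac{-\ln(1 - x(1 - \zeta/4))}{C}.$$

Since $x \ge \xi/2$, we have $\theta = \xi\zeta/8C \le \zeta x/4C$. By Observation 4.5, we also 
know that $\mu \le (1 - \zeta/2)/C$. This allows us to deduce the first desired 
inequality, thus 

$$\mu x + \theta \le \frac{(1 - \zeta/2)x}{C} + \frac{\zeta x}{4C} = \frac{x(1 - \zeta/4)}{C}.$$

To deduce the second inequality, we make the same substitutions for $\mu$ and 
$\theta$; then we show that the inequality holds termwise for the Taylor expansions 
around the origin: 

$$\theta + \mu(e^x - 1) \le \frac{\zeta x}{4C} + \frac{(1 - \zeta/2)}{C}(e^x - 1)$$

$$= \frac{\zeta x}{4C} + \frac{(1 - \zeta/2)}{C} \sum_{j \ge 1} \frac{x^j}{j!}$$

$$= \frac{(1 - \zeta/4)x}{C} + \frac{(1 - \zeta/2)}{C} \sum_{j \ge 2} \frac{x^j}{j!}$$

$$\le \frac{(1 - \zeta/4)x}{C} + \frac{1}{C} \sum_{j \ge 2} \frac{(1 - \zeta/4)^j x^j}{j}$$

$$= \frac{1}{C} \sum_{j \ge 1} \frac{(1 - \zeta/4)^j x^j}{j}$$

$$= \frac{-\ln(1 - x(1 - \zeta/4))}{C}.$$

Thus we have established $h(I(x)) \subseteq I(x(1 - \zeta/4))$ whenever $x \ge \xi/2$. It follows 
by induction that $h^k(I(1)) \subseteq I((1 - \zeta/4)^k)$ as long as $(1 - \zeta/4)^{k-1} \ge \xi/2$. 
Let $k$ be the least integer such that $(1 - \zeta/4)^k \le 
C\xi/(1 + C\xi)$. Then $h^k(I(1)) \subseteq I(C\xi/(1 + C\xi))$. By another application of 
Observation 4.5, we may deduce $[0,1] \subseteq I(1) = [\mu - 1/C, \infty)$. Also, elementary 
algebra implies $I(C\xi/(1 + C\xi)) \subseteq \mu \pm \xi$. It follows by monotonicity of 
$h$ that $h^k([0,1]) \subseteq h^k(I(1)) \subseteq I(C\xi/(1 + C\xi)) \subseteq \mu \pm \xi$. 
Solving for $k$, we find 

$$k = \left\lceil \frac{\ln(C\xi/(1 + C\xi))}{\ln(1 - \zeta/4)} \right\rceil \le \left\lceil \frac{4}{\zeta} \ln((1 + C\xi)/C\xi) \right\rceil \le \left\lceil \frac{4}{\zeta} \ln(1 + 1/\xi) \right\rceil,$$

which completes the proof. □ 

Now we are equipped to prove Lemma 4.2. 

Proof of Lemma 4.2. Fix a vertex $v \in V$. For each neighbor $w$ of $v$, 
let $Y_w$ denote the indicator variable for the event that $X \cap N^*(w) = \emptyset$. Then 

$$|U(X,v)| = \sum_{w \in N(v)} Y_w.$$

Now, for each neighbor $w$, let us compute the conditional expectation of 
$Y_w$ given $X \setminus N^*(w)$, that is, given $X \cap (V \setminus N^*(w))$. By definition, $Y_w$ equals 
1 iff no neighbor of $w$ is in $X$, except possibly $v$. Hence if $w \in X$, we have 

$$E(Y_w \mid X \setminus N^*(w)) = 1.$$

If $w \notin X$, then 

$$E(Y_w \mid X \setminus N^*(w)) = \prod_{z \in U(X,w) \setminus \{v\}} \frac{1}{1 + \lambda}$$

$$= (1 + \lambda)^{-|U(X,w) \setminus \{v\}|}$$

$$\in (e^{-\lambda}, e^{-\lambda + \lambda^2})^{|U(X,w) \setminus \{v\}|}$$

$$\subseteq (1, e^{\lambda^2 |U(X,w) \setminus \{v\}|})\, e^{-\lambda |U(X,w) \setminus \{v\}|}$$

$$\subseteq (1, e^{\lambda^2 (\Delta - 1)})\, e^{-\lambda |U(X,w) \setminus \{v\}|}$$

$$\subseteq (1, e^{e\lambda})\, e^{-\lambda |U(X,w) \setminus \{v\}|}$$

$$\subseteq (1, e^{(e+1)\lambda})\, e^{-\lambda |U(X,w)|}$$

$$\subseteq e^{-\lambda |U(X,w)|} \pm (e^{(e+1)\lambda} - 1)$$

$$\subseteq e^{-\lambda |U(X,w)|} \pm \frac{e^2 + e + 1}{\Delta},$$

assuming $e^{(e+1)\lambda} \le e^{(e+1)e/\Delta} \le 1 + (e^2 + e + 1)/\Delta$, which is true whenever 
$\Delta \ge 100$. Since the desired upper bound on probability is trivial for $\Delta < 750$, 
the above assumption may be made without loss of generality. 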
Let $S_2(v)$ denote the set of vertices at distance 2 from $v$, that is, 

$$S_2(v) = \bigcup_{w \in N(v)} N^*(w).$$

Applying linearity of expectation and then averaging, we have 

$$E(|U(X,v)| \mid X \setminus S_2(v)) = \sum_{w \in N(v)} E(Y_w \mid X \setminus S_2(v))$$

$$\in |X \cap N(v)| + \sum_{w \in N(v) \setminus X} \exp(-\lambda |U(X,w)|) \pm (e^2 + e + 1).$$

Since the girth is at least 6, there are no edges between vertices in $S_2(v)$. 
Hence, conditioned on $X \setminus S_2(v)$, the random variables $Y_w$ are fully independent, 
and take values in $[0,1]$. It follows by Chernoff's bound that, for 
all $\psi > 0$, 

$$(8) \qquad \Pr\left(|U(X,v)| \notin |X \cap N(v)| + \sum_{w \in N(v) \setminus X} \exp(-\lambda |U(X,w)|) \pm (e^2 + e + 1 + \psi\Delta/2)\right) \le 2\exp(-\psi^2 \Delta/8).$$

If $v \in X$, we have $|X \cap N(v)| = 0$. When $v \notin X$, since $\lambda < e/\Delta$, 

$$E(|X \cap N(v)| \mid X \setminus N(v)) \le \Delta \frac{\lambda}{1 + \lambda} < e.$$

Apply Chernoff's bound again, this time to the sum of indicator variables 
for the events $\{w \in X\}$, where $w$ is a neighbor of $v$. These events are conditionally 
independent, given $X \setminus N(v)$, and so, since the uniform upper bound 
of $e$ applies to the conditional expectation, Chernoff's bound implies, for all 
$\psi > 0$, 

$$(9) \qquad \Pr(|X \cap N(v)| \ge e + \psi\Delta/2) \le \exp(-\psi^2 \Delta/8).$$

Combining (8) and (9), we have, for all $\psi > 0$, 

$$\Pr\left(|U(X,v)| \notin \sum_{w \in N(v)} \exp(-\lambda |U(X,w)|) \pm ((e+1)^2 + \psi\Delta)\right) \le 3\exp(-\psi^2 \Delta/8).$$

Applying Lemmas 4.4 and 4.6 with parameters $f(v) = |U(X,v)|/\Delta$, $C = 
\lambda\Delta$, $g(x) = \exp(-Cx)$, $\theta = \xi\zeta/8C$, $h = g \pm \theta$ and $\psi = 3\exp(-(\theta - (e+1)^2/\Delta)^2 \Delta/8)$, 
there must exist some $t \ge 1$ such that 

$$\Pr\bigl((\exists v \in V)\ |U(X,v)| \notin (\mu \pm \xi)\Delta\bigr) \le \Pr\bigl((\exists v \in V)\ |U(X,v)|/\Delta \notin h^t([0,1])\bigr) \le n\psi. \qquad □$$

4.2. Convergence of the coupling. We now show that pairs $(X,Y)$ are 
distance-decreasing, so long as no vertex of $G$ has too many unblocked neighbors 
with respect to either set. 

Lemma 4.8. Let $G$ be a graph, and consider the Glauber dynamics for 
independent sets, with fugacity $\lambda$. Let $X$ and $Y$ be independent sets and 
suppose there exists $\zeta > 0$ such that for every vertex $v$, 

$$|U(X,v)|, |U(Y,v)| \le (1 - \zeta) \frac{1 + \lambda}{\lambda}.$$

Then the pair $(X,Y)$ is $\zeta/n$ distance-decreasing. 

Proof. Let $X', Y'$ be the new sets obtained after doing one step of 
Glauber dynamics starting from $(X,Y)$, using a maximal coupling. More 
explicitly, select a uniformly random vertex $v^*$ for update. If the neighborhood 
of $v^*$ is disjoint from both $X$ and $Y$, then with probability $\lambda/(1 + \lambda)$, 
set $X' = X \cup \{v^*\}$ and $Y' = Y \cup \{v^*\}$, and otherwise set $X' = X \setminus \{v^*\}$ and 
$Y' = Y \setminus \{v^*\}$. If the neighborhood of $v^*$ is disjoint from exactly one of $X$ or $Y$, 
add $v^*$ to the corresponding new set $X'$ or $Y'$ with probability $\lambda/(1 + \lambda)$, and 
remove it from that set with the remaining probability. Otherwise, 
make no change. 

Let $D = \{v : X(v) \ne Y(v)\}$, and $D' = \{v : X'(v) \ne Y'(v)\}$. Then, letting $\rho$ 
denote Hamming distance, we have 

$$E(\rho(X',Y')) - \rho(X,Y) = E(|D'|) - |D|$$

$$= -\Pr(v^* \in D) + \Pr(v^* \in D')$$

$$\le -\frac{\rho(X,Y)}{n} + \sum_{w \in D} \sum_{v \in N(w)} \Pr(v^* = v \text{ and } X'(v) \ne Y'(v))$$

$$\le -\frac{\rho(X,Y)}{n} + \frac{\lambda}{n(1 + \lambda)} \sum_{w \in D} \max\{|U(X,w)|, |U(Y,w)|\}$$

$$\le \frac{-\zeta \rho(X,Y)}{n},$$

where the last inequality follows by the hypothesis of the lemma. □ 
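One simple way to realize a coupled step of this kind is to let both chains share the vertex choice and the add/remove coin. The sketch below does exactly that; the function name `coupled_hardcore_step` and the adjacency-dictionary and `frozenset` representations are assumptions made for illustration.

```python
import random
from typing import Dict, FrozenSet, List, Tuple


def coupled_hardcore_step(adj: Dict[int, List[int]], X: FrozenSet[int],
                          Y: FrozenSet[int], lam: float,
                          rng: random.Random) -> Tuple[FrozenSet[int], FrozenSet[int]]:
    """Advance two hard-core Glauber chains by one step using shared randomness."""
    v = rng.choice(sorted(adj))                    # same vertex in both chains
    add = rng.random() < lam / (1 + lam)           # same coin in both chains

    def update(S: FrozenSet[int]) -> FrozenSet[int]:
        if add:
            # Adding v is allowed only when no neighbor of v is occupied.
            return S if any(w in S for w in adj[v]) else S | {v}
        return S - {v}

    return update(X), update(Y)
```

Each chain on its own follows the Glauber dynamics, and once the two sets agree they agree forever. A new disagreement can only appear at a vertex that is unblocked in exactly one of the two sets.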

4.3. Proof of Lemma 4.1. We are now ready to prove Lemma 4.1, estab- 
lishing rapid mixing of the Glauber dynamics from a warm start. 

Proof of Lemma 4.1. Let $\mu = \exp(-\mu\lambda\Delta)$. Let $S$ denote the set of 
independent sets $X$ such that for every vertex $v$, 

$$|U(X,v)| \le (\mu + \zeta/8)\Delta.$$

Applying Lemma 4.2 with $\xi = \zeta/8$, then simplifying using our assumptions 
$\lambda\Delta < e$ and $\Delta \ge 320000 \ln(144n^3/\zeta\delta)/\zeta^4$, we obtain 

$$\pi(S) \ge 1 - 3n \exp\left(\frac{-\zeta^4 \Delta}{320000}\right) \ge 1 - \frac{\zeta\delta}{48n^2}.$$

Let $X \in S$, and let $v$ be any vertex. Applying Observation 4.5 to the fixed 
point $\mu$ (with $C = \lambda\Delta$), and recalling our hypothesis that $\lambda < e/\Delta$, 

$$|U(X,v)| \le (\mu + \zeta/8)\Delta \le \frac{1 - \zeta/2 + \zeta\lambda\Delta/8}{\lambda} \le \frac{1 - \zeta/8}{\lambda}.$$

Hence, by Lemma 4.8, every pair $(X,Y) \in S \times S$ is $\zeta/8n$ distance-decreasing. 

The desired result now follows by applying the coupling with stationarity 
Theorem 1.3 to the set $S$. □ 